Abstract: Multiple-input multiple-output (MIMO) broadcast channels (BCs) (MIMO-BCs) with perfect channel state information (CSI) at the transmitter are considered. Two schemes of joint user selection (US) and vector precoding (VP) (US-VP) with zero-forcing transmit beamforming (ZF-BF) are investigated: US and continuous VP (CVP) (US-CVP), and data-dependent US (DD-US). The replica method, developed in statistical physics, is used to analyze the energy penalties of the two US-VP schemes in the large-system limit, where the number of users, the number of selected users, and the number of transmit antennas tend to infinity with their ratios kept constant. Four observations are obtained in the large-system limit. First, the assumptions of replica symmetry (RS) and 1-step replica symmetry breaking (1RSB) for DD-US can provide acceptable approximations for low and moderate system loads, respectively. Secondly, DD-US outperforms CVP with random US in terms of the energy penalty for low-to-moderate system loads. Thirdly, the asymptotic energy penalty of DD-US is indistinguishable from that of US-CVP for low system loads. Finally, a greedy algorithm for DD-US proposed in the authors' previous work can achieve nearly optimal performance for low-to-moderate system loads.

I. Introduction 

Multiple-input multiple-output (MIMO) broadcast channels (BCs) (MIMO-BCs) are a model for the downlink of multiuser MIMO systems. The capacity region of the MIMO-BC with perfect channel state information (CSI) at the transmitter has been shown to be achieved by dirty-paper coding (DPC) [1]-[4], a coding scheme that pre-cancels inter-user interference at the transmitter side [5]. Since DPC is infeasible in terms of complexity, an important research topic is to construct suboptimal schemes that achieve an acceptable tradeoff between performance and complexity.

Zero-forcing transmit beamforming (ZF-BF) is a naive approach for eliminating inter-user interference at the transmitter [6], [7]. A drawback of ZF-BF is that the energy required for pre-canceling inter-user interference, called the energy penalty in this paper, diverges as the system load increases. An increase of the energy penalty results in a degradation of the receive signal-to-noise ratio (SNR).

User selection (US) with ZF-BF [8], [9] is a promising approach when the number of users is much greater than the number of transmit antennas. Interestingly, it was proved that a greedy algorithm for US with ZF-BF can achieve the sum capacity of the MIMO-BC when only the number of users tends to infinity [9], [10]. This is because, in that limit, it is possible to select a finite subset of users whose channel vectors are almost orthogonal. As the number of transmit antennas increases, it becomes difficult to select such a subset of users, since the number of selected users must also increase to improve the throughput. This implies that US with ZF-BF is suboptimal when the number of transmit antennas is comparable to the number of users. Such a situation is becoming practical [11]. The goal of this paper is to construct a precoding scheme that works well in that situation.

Vector perturbation [12] or vector precoding (VP) [13] is a sophisticated precoding scheme suited for the situation where the number of transmit antennas is comparable to the number of users. In VP, the data vector is modified to take values in a relaxed alphabet [12], [13]. This relaxation reduces the energy penalty without degrading the minimum distance between the data symbols. As relaxed alphabets, a lattice-type alphabet [12], [14] and a continuous alphabet [13] have been proposed. In this paper, VP schemes with the lattice-type and continuous alphabets are referred to as "lattice VP (LVP)" and "continuous VP (CVP)," respectively. For CVP, the search for the vector that minimizes the energy penalty reduces to a convex optimization problem¹, while the search problem for LVP is NP-hard. However, CVP might still be hard to implement since the convex optimization has to be solved in every time slot. The goal of our research is to propose a more practical precoding scheme.

We propose joint US and VP (US-VP) and analyze its performance in the large-system limit, in which the number of transmit antennas $N$, the number of users $K$, and the number of selected users $\tilde{K}$ tend to infinity with the ratios $\alpha = K/N$ and $\kappa = \tilde{K}/K$ kept constant. In this paper, $\alpha\kappa = \tilde{K}/N$ is referred to as the system load.

¹ The problem is non-convex for the joint user selection and CVP considered in this paper.

Data-dependent US (DD-US), proposed in [15], can be regarded as a special case of US-VP, i.e., as US-VP that uses the original alphabet as the relaxed alphabet. Like VP, DD-US takes into account the data symbols, along with the channel vectors, to reduce the energy penalty. Furthermore, DD-US can be implemented with a suboptimal greedy algorithm [15].

The large-system analysis presented in this paper is based on the non-rigorous replica method, developed in statistical physics [16], [17]. The replica method is a powerful tool for analyzing the large-system performance of MIMO systems [13], [18], [19]. Several results based on the replica method have been justified rigorously; see [20]-[22] for the details.

II. MIMO Broadcast Channel 

We consider a Gaussian MIMO-BC with perfect CSI at the transmitter, which consists of a base station with $N$ transmit antennas and $K$ receivers (users), each with one receive antenna. The coherence time $T_c$ is assumed to be sufficiently long². Let $y_{k,t} \in \mathbb{C}$ denote the received signal of user $k$ in time slot $t$ ($t = 0, 1, \ldots, T_c - 1$). The received vector $\boldsymbol{y}_t = (y_{1,t}, \ldots, y_{K,t})^T$ in time slot $t$ is given by

$$ \boldsymbol{y}_t = \frac{1}{\sqrt{\mathcal{E}}}\boldsymbol{H}\boldsymbol{u}_t + \boldsymbol{n}_t, \qquad \boldsymbol{n}_t \sim \mathcal{CN}(\boldsymbol{0}, N_0\boldsymbol{I}_K). \qquad (1) $$

In (1), $\boldsymbol{u}_t = (u_{1,t}, \ldots, u_{N,t})^T \in \mathbb{C}^N$ denotes the transmit vector in time slot $t$, defined in the next section. The $(k,n)$-element of the channel matrix $\boldsymbol{H} \in \mathbb{C}^{K\times N}$ represents the complex channel gain between the $n$th transmit antenna and the $k$th user. The energy penalty $\mathcal{E}$ is defined as the time average of the power of the transmit vectors,

$$ \mathcal{E} = \frac{1}{T_c}\sum_{t=0}^{T_c-1}\|\boldsymbol{u}_t\|^2. \qquad (2) $$

Thus, the prefactor $\mathcal{E}^{-1/2}$ in (1) normalizes the time-averaged transmit power to one, i.e., the transmit SNR is constrained to $1/N_0$.
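The model (1)-(2) can be sketched numerically as follows. This is a minimal NumPy sketch, not taken from the paper: the function names `energy_penalty` and `received_signals` are hypothetical, and the transmit vectors are arbitrary Gaussian placeholders rather than the output of a precoder.

```python
import numpy as np

def energy_penalty(U):
    """Time average of ||u_t||^2 over the columns u_t of U (N x T_c), as in (2)."""
    return np.mean(np.sum(np.abs(U) ** 2, axis=0))

def received_signals(H, U, N0, rng):
    """Received vectors y_t = H u_t / sqrt(E) + n_t for every column u_t, as in (1)."""
    K, Tc = H.shape[0], U.shape[1]
    E = energy_penalty(U)
    # Circularly symmetric complex Gaussian noise with variance N0 per entry.
    noise = np.sqrt(N0 / 2) * (rng.standard_normal((K, Tc))
                               + 1j * rng.standard_normal((K, Tc)))
    return H @ U / np.sqrt(E) + noise, E

rng = np.random.default_rng(0)
N, K, Tc = 8, 4, 100
# Channel entries are i.i.d. CN(0, 1/N).
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2 * N)
U = rng.standard_normal((N, Tc)) + 1j * rng.standard_normal((N, Tc))  # placeholder u_t
Y, E = received_signals(H, U, N0=0.1, rng=rng)
```

Dividing by $\sqrt{\mathcal{E}}$ makes the time-averaged transmit power exactly one, which is why the transmit SNR equals $1/N_0$ regardless of the precoder.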

We assume that $\boldsymbol{H}$ is known to the transmitter and that $\boldsymbol{H}$ has mutually independent circularly symmetric complex Gaussian entries with variance $1/N$. These idealized assumptions allow us to calculate the energy penalty analytically. For simplicity, quadrature phase shift keying (QPSK) is used, and power allocation is not considered.

III. Precoding 
A. Zero-Forcing Transmit Beamforming 

We start with the conventional ZF-BF [7], assuming $K < N$. Let $\boldsymbol{x}_t = (x_{1,t}, \ldots, x_{K,t})^T$ denote the QPSK data symbol vector with unit power in time slot $t$. The transmit vector $\boldsymbol{u}_t$ is linearly precoded as follows:

$$ \boldsymbol{u}_t = \boldsymbol{H}^H(\boldsymbol{H}\boldsymbol{H}^H)^{-1}\boldsymbol{x}_t. \qquad (3) $$



² The base station may utilize the reciprocal channel to obtain CSI in practice. In order to attain accurate CSI, the coherence time $T_c$ should be larger than the number of users $K$. In this paper, the limit $T_c \to \infty$ is implicitly taken before the large-system limit.
Substituting (3) into (1) shows that inter-user interference is eliminated completely. More precisely, the MIMO-BC (1) is decomposed into single-user Gaussian channels with receive SNR $1/(\mathcal{E}N_0)$. The drawback of ZF-BF is that the energy penalty (2) in the limit $T_c \to \infty$,

$$ \frac{1}{T_c}\sum_{t=0}^{T_c-1}\boldsymbol{x}_t^H(\boldsymbol{H}\boldsymbol{H}^H)^{-1}\boldsymbol{x}_t \to \mathrm{Tr}\{(\boldsymbol{H}\boldsymbol{H}^H)^{-1}\}, \qquad (4) $$

diverges as $\alpha = K/N \to 1$ [23]. Consequently, the receive SNR tends to zero as $\alpha \to 1$. VP has been considered as a way to circumvent this divergence of the energy penalty.
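A quick Monte Carlo check of (3)-(4) is sketched below, assuming NumPy; the sizes and the seed are arbitrary choices. For a $K\times N$ channel with i.i.d. $\mathcal{CN}(0,1/N)$ entries, the expected value of $\mathrm{Tr}\{(\boldsymbol{H}\boldsymbol{H}^H)^{-1}\}/K$ is $N/(N-K) = 1/(1-\alpha)$, which makes the divergence as $\alpha \to 1$ explicit.

```python
import numpy as np

def complex_gaussian(rng, shape, var):
    """I.i.d. circularly symmetric complex Gaussian entries with variance var."""
    return np.sqrt(var / 2) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def zf_precode(H, X):
    """ZF-BF (3) applied to every column x_t of X; returns the columns u_t."""
    return H.conj().T @ np.linalg.solve(H @ H.conj().T, X)

rng = np.random.default_rng(1)
N, K, Tc = 400, 200, 50                      # alpha = K/N = 0.5
H = complex_gaussian(rng, (K, N), 1.0 / N)
# Unit-power QPSK symbols (+-1 +-j)/sqrt(2).
X = (rng.choice([-1.0, 1.0], size=(K, Tc))
     + 1j * rng.choice([-1.0, 1.0], size=(K, Tc))) / np.sqrt(2)
U = zf_precode(H, X)

interference_free = np.allclose(H @ U, X)    # H u_t = x_t: no inter-user interference
E = np.mean(np.sum(np.abs(U) ** 2, axis=0))  # energy penalty (2)
trace_term = np.trace(np.linalg.inv(H @ H.conj().T)).real  # RHS of (4)
alpha = K / N
print(E / K, trace_term / K, 1 / (1 - alpha))  # each should be close to 2 here
```

Re-running with `K` closer to `N` shows the per-user energy growing like $1/(1-\alpha)$.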

B. Vector Precoding 

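In VP with ZF-BF, each data symbol $x_{k,t}$ is modified to take values in a relaxed alphabet $\mathcal{X}_{x_{k,t}} \subset \mathbb{C}$. The relaxed alphabets for different data symbols must be disjoint, i.e., $\mathcal{X}_x \cap \mathcal{X}_{x'} = \emptyset$ for all $x \neq x'$. Since information is conveyed by the relaxed alphabet $\mathcal{X}_{x_{k,t}}$, the receiver detects the relaxed alphabet $\mathcal{X}_{x_{k,t}}$. See [12] for the details. If the vector $\tilde{\boldsymbol{x}}_t$ that minimizes the energy penalty (2) is used as the modified vector, the transmit vector is given by

$$ \boldsymbol{u}_t = \boldsymbol{H}^H(\boldsymbol{H}\boldsymbol{H}^H)^{-1}\tilde{\boldsymbol{x}}_t, \qquad (5) $$

with

$$ \tilde{\boldsymbol{x}}_t = \operatorname*{argmin}_{\tilde{\boldsymbol{x}}_t \in \prod_{k=1}^{K}\mathcal{X}_{x_{k,t}}} \tilde{\boldsymbol{x}}_t^H(\boldsymbol{H}\boldsymbol{H}^H)^{-1}\tilde{\boldsymbol{x}}_t. \qquad (6) $$

Note that the vector (6), which minimizes each instantaneous power $\|\boldsymbol{u}_t\|^2$, minimizes the energy penalty (2) for any $T_c$.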

Example 1 (LVP). In LVP [12], two-dimensional (one-complex-dimensional) square lattices are used as the relaxed alphabets,

$$ \mathcal{X}_x = \left(2\sqrt{2}\,\mathbb{Z} + \Re[x]\right) + j\left(2\sqrt{2}\,\mathbb{Z} + \Im[x]\right), \qquad (7) $$

for $\Re[x], \Im[x] \in \{\pm 1/\sqrt{2}\}$. Finding the optimal vector (6) for LVP is infeasible in terms of complexity.
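For very small systems, (6) with the lattice alphabets (7) can be solved by brute force. The NumPy sketch below truncates the lattice to offsets $2\sqrt{2}(a + jb)$ with integers $|a|, |b| \le 1$; this truncation, the function name, and the toy sizes are assumptions made to keep the search finite, so the result is only the best point within the truncated set. Practical implementations typically rely on sphere encoding or lattice-reduction-aided search instead.

```python
import itertools
import numpy as np

TAU = 2 * np.sqrt(2)   # lattice period in (7); keeps the relaxed alphabets disjoint

def lvp_truncated_search(A, x, max_shift=1):
    """Exhaustive search for (6) over a truncated version of the lattices (7).

    A is (H H^H)^{-1} and x a QPSK data vector. Candidates are
    x + TAU*(a + j b) with integer vectors a, b in [-max_shift, max_shift]^K.
    The cost grows as (2*max_shift + 1)^(2K), so this is for toy sizes only.
    """
    K = len(x)
    shifts = range(-max_shift, max_shift + 1)
    best_x, best_energy = x, np.real(x.conj() @ A @ x)   # zero offset = plain ZF
    for offsets in itertools.product(shifts, repeat=2 * K):
        cand = x + TAU * (np.array(offsets[:K]) + 1j * np.array(offsets[K:]))
        energy = np.real(cand.conj() @ A @ cand)
        if energy < best_energy:
            best_x, best_energy = cand, energy
    return best_x, best_energy

rng = np.random.default_rng(2)
K = N = 3                                     # fully loaded: ZF energy tends to be large
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2 * N)
A = np.linalg.inv(H @ H.conj().T)
x = (rng.choice([-1.0, 1.0], size=K) + 1j * rng.choice([-1.0, 1.0], size=K)) / np.sqrt(2)
x_lvp, e_lvp = lvp_truncated_search(A, x)
e_zf = np.real(x.conj() @ A @ x)
```

Because the zero offset is always a candidate, the LVP energy can never exceed the plain ZF energy.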

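Example 2 (CVP). In CVP [13], the original alphabets are relaxed to continuous disjoint alphabets,

$$ \mathcal{X}_x = \left\{a + jb : a \in \mathcal{X}_{\Re[x]},\ b \in \mathcal{X}_{\Im[x]}\right\}, \qquad (8) $$

with the real-valued alphabets

$$ \mathcal{X}_x = \begin{cases} [x, \infty) & \text{for } x = 1/\sqrt{2}, \\ (-\infty, x] & \text{for } x = -1/\sqrt{2}. \end{cases} \qquad (9) $$

The minimization (6) for CVP reduces to a convex optimization problem, so that an efficient algorithm can be used to solve (6). However, CVP might still be hard to implement since the convex optimization needs to be solved in every time slot.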
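One simple way to solve the convex problem (6) for CVP is projected gradient descent, sketched below with NumPy. The choice of algorithm, the step size, and the iteration count are assumptions of this sketch (the text does not specify a solver). Since the alphabets (8)-(9) are products of half-lines, the projection acts separately on each real and imaginary part.

```python
import numpy as np

def project_cvp(v, x):
    """Euclidean projection of v onto the CVP alphabets (8)-(9) of the QPSK vector x.

    Per real dimension, +1/sqrt(2) may move to [1/sqrt(2), inf) and
    -1/sqrt(2) to (-inf, -1/sqrt(2)].
    """
    def proj(vr, xr):
        return np.where(xr > 0, np.maximum(vr, xr), np.minimum(vr, xr))
    return proj(v.real, x.real) + 1j * proj(v.imag, x.imag)

def cvp_projected_gradient(A, x, iters=500):
    """Approximately minimize v^H A v over the CVP alphabets, A = (H H^H)^{-1}.

    Viewed as a function of (Re v, Im v), the gradient of v^H A v is 2 A v with
    Lipschitz constant 2 * lambda_max(A), so v - (A v) / lambda_max(A) is a
    1/L gradient step; with the projection, the objective never increases.
    """
    lam_max = np.linalg.eigvalsh(A)[-1]
    v = x.astype(complex)                 # start from the original symbols (plain ZF)
    for _ in range(iters):
        v = project_cvp(v - (A @ v) / lam_max, x)
    return v, np.real(v.conj() @ A @ v)

rng = np.random.default_rng(3)
K = N = 8
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2 * N)
A = np.linalg.inv(H @ H.conj().T)
A = (A + A.conj().T) / 2                  # remove round-off asymmetry
x = (rng.choice([-1.0, 1.0], size=K) + 1j * rng.choice([-1.0, 1.0], size=K)) / np.sqrt(2)
v, e_cvp = cvp_projected_gradient(A, x)
e_zf = np.real(x.conj() @ A @ x)
```

For ill-conditioned channels a fixed step converges slowly; an accelerated or interior-point solver would be a natural replacement, at the cost of the per-slot complexity mentioned above.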

The point of VP is that the modified vector $\tilde{\boldsymbol{x}}_t$ depends on the channel matrix $\boldsymbol{H}$. Consequently, the energy penalty $T_c^{-1}\sum_{t=0}^{T_c-1}\tilde{\boldsymbol{x}}_t^H(\boldsymbol{H}\boldsymbol{H}^H)^{-1}\tilde{\boldsymbol{x}}_t$ for VP does not tend to the right-hand side (RHS) of (4) as $T_c \to \infty$. In fact, the energy penalty for VP was shown to remain bounded in the limit $\alpha \to 1$ taken after the large-system limit [19].
C. Joint US and VP

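US-VP is performed every $T$ time slots. The block length $T$ should not be confused with the coherence time $T_c$. We write the set of selected users in the $i$th block of US-VP as $\mathcal{K}_i$ with $|\mathcal{K}_i| = \tilde{K}$, for $i = 0, 1, \ldots$. The base station sends the QPSK data symbols $\{x_{k,iT+t} : t = 0, \ldots, T-1\}$ to each selected user $k \in \mathcal{K}_i$ in block $i$. The transmit vector $\boldsymbol{u}_t$ ($t \in [iT, (i+1)T-1]$) in block $i$ is generated as

$$ \boldsymbol{u}_t = \boldsymbol{H}_{\mathcal{K}_i}^H(\boldsymbol{H}_{\mathcal{K}_i}\boldsymbol{H}_{\mathcal{K}_i}^H)^{-1}\tilde{\boldsymbol{x}}_{\mathcal{K}_i,t}, \qquad (10) $$

where $\boldsymbol{H}_{\mathcal{K}_i}$ consists of the rows of $\boldsymbol{H}$ indexed by $\mathcal{K}_i$. The set of selected users $\mathcal{K}_i \subset \mathcal{K} = \{1, \ldots, K\}$ and the modified vectors $\{\tilde{\boldsymbol{x}}_{\mathcal{K}_i,t} \in \prod_{k\in\mathcal{K}_i}\mathcal{X}_{x_{k,t}}\}$ minimize the energy penalty (2). They are given by

$$ \left(\mathcal{K}_i, \{\tilde{\boldsymbol{x}}_{\mathcal{K}_i,t}\}\right) = \operatorname*{argmin}_{\substack{\mathcal{K}_i \subset \mathcal{K}:\,|\mathcal{K}_i| = \tilde{K} \\ \tilde{\boldsymbol{x}}_{\mathcal{K}_i,t} \in \prod_{k\in\mathcal{K}_i}\mathcal{X}_{x_{k,t}}}} \sum_{t=iT}^{(i+1)T-1}\tilde{\boldsymbol{x}}_{\mathcal{K}_i,t}^H(\boldsymbol{H}_{\mathcal{K}_i}\boldsymbol{H}_{\mathcal{K}_i}^H)^{-1}\tilde{\boldsymbol{x}}_{\mathcal{K}_i,t}. \qquad (11) $$

It is difficult to solve the minimization (11) for US-LVP and US-CVP, which are defined as US-VP with (7) and (8), respectively. Instead, we focus on DD-US.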

Example 3 (Data-Dependent US). DD-US is defined as US-VP with the original alphabet used as the relaxed alphabet, i.e., $\mathcal{X}_x = \{x\}$. Since DD-US requires no per-slot search for modified vectors and its user selection is updated only once every $T$ time slots, DD-US may be more suitable for implementation. A greedy algorithm for DD-US with ZF-BF proposed in [15] allows us to solve the minimization (11) efficiently and approximately.
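To illustrate what a greedy DD-US search can look like, the NumPy sketch below adds one user at a time, each time choosing the user whose inclusion gives the smallest block energy, i.e., the objective of (11) with $\mathcal{X}_x = \{x\}$. This forward-selection rule is an assumption of the sketch and is not claimed to be the algorithm of [15]; the function names and sizes are hypothetical.

```python
import numpy as np

def block_energy(H, X, S):
    """Sum_t x_{S,t}^H (H_S H_S^H)^{-1} x_{S,t} over the block, i.e. (11) with X_x = {x}.

    H: K x N channel, X: K x T data block, S: list of selected user indices.
    """
    HS, XS = H[S, :], X[S, :]
    return float(np.real(np.sum(XS.conj() * np.linalg.solve(HS @ HS.conj().T, XS))))

def greedy_dd_us(H, X, K_sel):
    """Hypothetical forward greedy selection of K_sel users for one block."""
    selected, remaining = [], list(range(H.shape[0]))
    for _ in range(K_sel):
        best = min(remaining, key=lambda k: block_energy(H, X, selected + [k]))
        selected.append(best)
        remaining.remove(best)
    return sorted(selected), block_energy(H, X, selected)

rng = np.random.default_rng(4)
N, K, K_sel, T = 16, 64, 8, 20               # alpha = 4, system load alpha*kappa = 0.5
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2 * N)
X = (rng.choice([-1.0, 1.0], size=(K, T))
     + 1j * rng.choice([-1.0, 1.0], size=(K, T))) / np.sqrt(2)
selected, energy = greedy_dd_us(H, X, K_sel)
```

Comparing `energy / (T * K_sel)` with the random-selection value $N/(N-\tilde{K}) = 2$ for these sizes gives a rough feel for how much the data-dependent choice helps in this toy setting.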

Let us discuss the relationship between DD-US and conventional US, the latter of which selects a subset of users $\mathcal{K}_c \subset \mathcal{K}$ to minimize the energy penalty $\mathrm{Tr}\{(\boldsymbol{H}_{\mathcal{K}_c}\boldsymbol{H}_{\mathcal{K}_c}^H)^{-1}\}$. Obviously, the energy penalty of conventional US is no smaller than that of DD-US for any $T$. In DD-US, the set of selected users $\mathcal{K}_i$ is determined by the minimization (11), which makes an appropriate tradeoff between the orthogonality of the channel row vectors and the direction of the data symbols. The performance of DD-US degrades as $T$ increases, since it becomes difficult to select users whose data symbols all have good directions. Thus, the energy penalty of DD-US in the limit $T \to \infty$ can be regarded as a lower bound on the energy penalty of conventional US.
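The inequality between conventional US and DD-US can be checked by brute force on a toy system. The NumPy sketch below (sizes, seed, and function names are illustrative assumptions) finds the conventional-US subset minimizing $\mathrm{Tr}\{(\boldsymbol{H}_{\mathcal{K}_c}\boldsymbol{H}_{\mathcal{K}_c}^H)^{-1}\}$ and the DD-US subset minimizing the block energy in (11) with $\mathcal{X}_x = \{x\}$. Since the latter minimizes over all subsets, its block energy can never exceed that of the former.

```python
import itertools
import numpy as np

def block_energy(H, X, S):
    """Block energy sum_t x_{S,t}^H (H_S H_S^H)^{-1} x_{S,t} for the user subset S."""
    HS, XS = H[list(S), :], X[list(S), :]
    return float(np.real(np.sum(XS.conj() * np.linalg.solve(HS @ HS.conj().T, XS))))

def trace_penalty(H, S):
    """Data-independent criterion Tr{(H_S H_S^H)^{-1}} of conventional US."""
    HS = H[list(S), :]
    return float(np.real(np.trace(np.linalg.inv(HS @ HS.conj().T))))

def best_subsets(H, X, K_sel):
    """Exhaustive conventional US and exhaustive DD-US over all subsets of size K_sel."""
    subsets = list(itertools.combinations(range(H.shape[0]), K_sel))
    conv = min(subsets, key=lambda S: trace_penalty(H, S))
    dd = min(subsets, key=lambda S: block_energy(H, X, S))
    return conv, dd

rng = np.random.default_rng(6)
N, K, K_sel, T = 4, 8, 3, 5
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2 * N)
X = (rng.choice([-1.0, 1.0], size=(K, T))
     + 1j * rng.choice([-1.0, 1.0], size=(K, T))) / np.sqrt(2)
conv, dd = best_subsets(H, X, K_sel)
e_conv, e_dd = block_energy(H, X, conv), block_energy(H, X, dd)
```

Increasing `T` in this sketch lets the block energy of each subset concentrate around $T$ times its trace, which is the mechanism behind the degradation of DD-US with growing $T$.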

Each user has to detect whether he/she has been selected in each block of US. In order for each user to detect this blindly, the data symbols for the non-selected users should be discarded at the transmitter side [15]. Substituting (10) into (1) yields

$$ y_{k,t} = \frac{1}{\sqrt{\mathcal{E}}}\left[s_{k,i}\tilde{x}_{k,t} + (1 - s_{k,i})\boldsymbol{h}_k\boldsymbol{u}_t\right] + n_{k,t} \qquad (12) $$

for any $k$. In (12), $\tilde{x}_{k,t} \in \mathcal{X}_{x_{k,t}}$ denotes the modified data symbol corresponding to the original data symbol $x_{k,t}$. The variable $s_{k,i} \in \{0, 1\}$, which indicates whether user $k$ has been selected in block $i$, is defined as

$$ s_{k,i} = \begin{cases} 1 & k \in \mathcal{K}_i, \\ 0 & k \notin \mathcal{K}_i. \end{cases} \qquad (13) $$

Furthermore, $\boldsymbol{h}_k \in \mathbb{C}^{1\times N}$ denotes the $k$th row vector of the channel matrix $\boldsymbol{H}$. Note that the time indices $t$ of $y_{k,t}$ and $\tilde{x}_{k,t}$ in (12) are identical to each other, since the data symbols for the non-selected users $k \notin \mathcal{K}_i$ have been discarded. This simplifies the detection of (13) [15].
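The structure of (12) can be verified numerically. The NumPy sketch below uses DD-US ($\tilde{x}_{k,t} = x_{k,t}$) with a fixed, hypothetical subset $\mathcal{K}_i$ rather than an optimized one, applies (10), and checks that the noiseless received signals coincide with (12): the selected users see their own symbols scaled by $\mathcal{E}^{-1/2}$, while the others see only $\boldsymbol{h}_k\boldsymbol{u}_t$. The energy is averaged over this single block as a stand-in for (2).

```python
import numpy as np

rng = np.random.default_rng(5)
N, K, T = 8, 16, 4
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2 * N)
X = (rng.choice([-1.0, 1.0], size=(K, T))
     + 1j * rng.choice([-1.0, 1.0], size=(K, T))) / np.sqrt(2)

selected = [0, 3, 5, 9]                       # hypothetical K_i, not optimized
s = np.zeros(K)
s[selected] = 1.0                             # selection indicators s_{k,i} in (13)

HS = H[selected, :]
U = HS.conj().T @ np.linalg.solve(HS @ HS.conj().T, X[selected, :])  # (10) with x~ = x
E = np.mean(np.sum(np.abs(U) ** 2, axis=0))   # block average in place of (2)

noiseless = H @ U / np.sqrt(E)                # y_{k,t} in (1) without noise
# Right-hand side of (12) without n_{k,t}; the rows of X for unselected users
# are multiplied by zero, reflecting that their data symbols are discarded.
model_12 = (s[:, None] * X + (1.0 - s[:, None]) * (H @ U)) / np.sqrt(E)
```

An unselected user observes only the residual $\boldsymbol{h}_k\boldsymbol{u}_t$, which carries none of its own symbols; this difference is what the blind detection of $s_{k,i}$ exploits.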

It is easy for user $k$ to blindly detect the single variable $s_{k,i}$ from the $T$ observations $\{y_{k,t}\}$ in each block. Using decision feedback of $\tilde{x}_{k,t}$ from the decoder improves the accuracy of detection [15]. In order to reduce the energy penalty, a small $T$ should be used. On the other hand, too small a $T$ makes the detection difficult. One option is to use a few dozen time slots as the block length $T$. For example, the energy loss due to detection errors is at most 0.2-0.5 dB for $T = 16$ [15].

IV. Main Results 

The replica method is used to analyze the energy penalty for US-VP in the large-system limit, where $N$, $K$, and $\tilde{K}$ tend to infinity with the ratios $\alpha = K/N$ and $\kappa = \tilde{K}/K$ kept constant. The energy penalty is expected to be self-averaging in the large-system limit, i.e., it converges in probability (or almost surely) to its expectation. Thus, we focus on the average energy penalty.

We consider the assumptions of replica symmetry (RS) and of 1-step replica symmetry breaking (1RSB) [16], [17]. Roughly speaking, the RS assumption corresponds to the assumption that the solution to the minimization (11) is unique. On the other hand, 1RSB is the simplest assumption for the case in which there are many solutions. It is empirically known that the 1RSB assumption can provide a good approximation of the energy penalty [19].

Without loss of generality, we focus on the first block of US, i.e., $t = 0, \ldots, T-1$. Before presenting the main results, we summarize several definitions. Let us define a random variable $E_k(q)$ as

$$ E_k(q) = \frac{1}{T}\sum_{t=0}^{T-1}\min_{\tilde{x}_{k,t} \in \mathcal{X}_{x_{k,t}}}\left|\tilde{x}_{k,t} - \sqrt{q}\,z_{k,t}\right|^2, \qquad (14) $$

with $z_{k,t} \sim \mathcal{CN}(0, 1)$. We write the cumulative distribution function $\mathrm{Prob}(E_k(q) < x)$ of the random variable (14) and its inverse function as $F_T(x;q)$ and $F_T^{-1}(x;q)$, respectively. We define two quantities $\mu_{\kappa,T}(q)$ and $\sigma^2_{\kappa,T}(q)$ as



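$$ \mu_{\kappa,T}(q) = \int_0^{F_T^{-1}(\kappa;q)}\left[\kappa - F_T(x;q)\right]dx, \qquad (15) $$

$$ \sigma^2_{\kappa,T}(q) = \int_0^{F_T^{-1}(\kappa;q)}\int_0^{F_T^{-1}(\kappa;q)}\left[F_T(\min(x,y);q) - F_T(x;q)F_T(y;q)\right]dx\,dy, \qquad (16) $$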

respectively. These quantities are associated with the mean and variance of

$$ \frac{1}{K}\sum_{k=1}^{\tilde{K}}E_{(k)}(q), \qquad (17) $$

where $\{E_{(k)}(q)\}$ are the order statistics of (14), i.e., $E_{(1)}(q) \le \cdots \le E_{(K)}(q)$.
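The quantities (14)-(17) can be estimated by Monte Carlo, which also makes the role of the order statistics concrete. The pure-Python sketch below (all names hypothetical) samples $E_k(q)$ for DD-US and for CVP with QPSK. It uses the identities $\int_0^a F_T(x;q)\,dx = \mathbb{E}[(a - E_k(q))_+]$ and $\int_0^a\int_0^a [F_T(\min(x,y);q) - F_T(x;q)F_T(y;q)]\,dx\,dy = \mathrm{Var}[(a - E_k(q))_+]$ with $a = F_T^{-1}(\kappa;q)$, which follow from $(a - E)_+ = \int_0^a 1\{E \le x\}\,dx$ for $E \ge 0$. With the empirical distribution, the resulting estimate of $\mu_{\kappa,T}(q)$ coincides with the trimmed sum (17).

```python
import math
import random

A = 1 / math.sqrt(2)          # QPSK amplitude per real dimension

def min_dist2_real(sym, c, relaxed):
    """Squared distance from c to the (relaxed) real alphabet of the symbol sym.

    DD-US: the alphabet is {sym}. CVP (9): [sym, inf) if sym > 0, (-inf, sym] if sym < 0.
    """
    if relaxed and (c - sym) * sym > 0:   # c lies inside the half-line
        return 0.0
    return (sym - c) ** 2

def sample_E(q, T, relaxed, rng):
    """One sample of E_k(q) in (14) for QPSK data and z_{k,t} ~ CN(0, 1)."""
    s = math.sqrt(q / 2)                  # std of sqrt(q)Re[z] and of sqrt(q)Im[z]
    total = 0.0
    for _ in range(T):
        for _dim in range(2):             # real and imaginary parts
            sym = rng.choice((A, -A))
            total += min_dist2_real(sym, s * rng.gauss(0.0, 1.0), relaxed)
    return total / T

def mu_sigma2(samples, kappa):
    """Plug-in estimates of (15)-(16) and of the trimmed sum (17).

    With a = F_T^{-1}(kappa; q) and E_k(q) >= 0, (15) equals
    kappa*a - E[(a - E_k)_+] and (16) equals Var[(a - E_k)_+].
    """
    xs = sorted(samples)                  # order statistics E_(1) <= ... <= E_(K)
    K = len(xs)
    K_sel = int(kappa * K)
    a = xs[K_sel - 1]                     # empirical kappa-quantile
    clipped = [max(a - e, 0.0) for e in xs]
    m = sum(clipped) / K
    mu = kappa * a - m
    sigma2 = sum((c - m) ** 2 for c in clipped) / K
    trimmed = sum(xs[:K_sel]) / K         # (1/K) * sum of the K_sel smallest samples
    return mu, sigma2, trimmed

q, T, K = 0.5, 10, 2000
# Same seed per user for both schemes, so each CVP sample uses the same draws as DD-US.
dd = [sample_E(q, T, False, random.Random(k)) for k in range(K)]
cvp = [sample_E(q, T, True, random.Random(k)) for k in range(K)]
mu_dd, s2_dd, trim_dd = mu_sigma2(dd, 0.25)
```

Coupling the random draws makes the pointwise ordering visible: the relaxed alphabet can only shrink each distance, so every CVP sample is at most the corresponding DD-US sample.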

Proposition 1. Under the RS assumption, the average energy penalty per selected user $\mathbb{E}[\mathcal{E}]/\tilde{K}$ for US-VP converges in the large-system limit to $q_0/(\alpha\kappa)$, where $q_0$ is the solution to the fixed-point equation

$$ q_0 = \alpha\mu_{\kappa,T}(q_0). \qquad (18) $$
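Proposition 1 can be evaluated numerically by iterating (18). The plain-Python sketch below (the function name is hypothetical) accepts any routine for $\mu_{\kappa,T}$. For the check it uses DD-US with $T \to \infty$, where by (22) $\mu_{\kappa,T}(q) \to \kappa(1+q)$, because $\mathbb{E}|x - \sqrt{q}z|^2 = 1 + q$ for a unit-power symbol $x$ and $z \sim \mathcal{CN}(0,1)$. The fixed point is then $q_0 = \alpha\kappa/(1 - \alpha\kappa)$, which reproduces the closed form (23).

```python
def rs_fixed_point(mu, alpha, q_init=0.0, tol=1e-12, max_iter=10_000):
    """Solve q0 = alpha * mu(q0) in (18) by plain fixed-point iteration.

    mu is any callable q -> mu_{kappa,T}(q); convergence is assumed, e.g. when
    q -> alpha * mu(q) is a contraction around the solution.
    """
    q = q_init
    for _ in range(max_iter):
        q_next = alpha * mu(q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    raise RuntimeError("fixed-point iteration did not converge")

# DD-US in the limit T -> infinity: mu_{kappa,T}(q) -> kappa * (1 + q).
alpha, kappa = 4.0, 0.125                     # system load alpha * kappa = 0.5
q0 = rs_fixed_point(lambda q: kappa * (1.0 + q), alpha)
energy_per_selected_user = q0 / (alpha * kappa)
```

For finite $T$, `mu` could be replaced by a Monte Carlo estimate of (15); the iteration then converges only up to sampling noise, so a looser tolerance would be needed.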



Proposition 2. Under the 1RSB assumption, the average energy penalty per selected user $\mathbb{E}[\mathcal{E}]/\tilde{K}$ for US-VP converges in the large-system limit to $q_1/(\alpha\kappa)$, where $q_1$ satisfies the coupled fixed-point equations

$$ g(q_1, x) = 0, \qquad (19) $$

$$ \frac{\partial g}{\partial x}(q_1, x) = 0, \qquad (20) $$

for some $x > 0$, with

$$ g(q_1, x) = \ln\left(1 + \frac{q_1}{x}\right) - \frac{\alpha\mu_{\kappa,T}(q_1)}{x} + \frac{T\alpha\sigma^2_{\kappa,T}(q_1)}{2x^2}. \qquad (21) $$



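See [25, Appendices C and D] for the details of the derivations. The central limit theorem implies that the random variable (14) is asymptotically Gaussian as $T \to \infty$, with fluctuations around its mean that vanish as $T$ grows. It is then straightforward to find that (15) reduces to

$$ \lim_{T\to\infty}\mu_{\kappa,T}(q) = \kappa\,\mathbb{E}\left[\min_{\tilde{x}_t \in \mathcal{X}_{x_t}}\left|\tilde{x}_t - \sqrt{q}\,z_t\right|^2\right]. \qquad (22) $$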
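As a numerical companion to (22), the sketch below evaluates the limiting expectation in closed form for CVP and for DD-US with QPSK, and combines it with (18) to obtain the RS energy per selected user as $T \to \infty$. It is plain Python; the per-dimension Gaussian-integral formula in the docstring is a standard calculation added here, and the function names are hypothetical. The DD-US value reproduces (23).

```python
import math

def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def mean_min_dist2_cvp(q):
    """Expectation in (22) for QPSK with the CVP alphabets (8)-(9), q > 0.

    Per real dimension, take the symbol a = 1/sqrt(2) by symmetry and
    c ~ N(0, s^2) with s^2 = q/2; the distance is (a - c)^2 for c < a and 0
    otherwise, with mean (a^2 + s^2) Phi(a/s) + a s phi(a/s).
    """
    a, s = 1.0 / math.sqrt(2.0), math.sqrt(q / 2.0)
    b = a / s
    return 2.0 * ((a * a + s * s) * std_normal_cdf(b) + a * s * std_normal_pdf(b))

def mean_min_dist2_dd(q):
    """Expectation in (22) for DD-US (X_x = {x}): E|x - sqrt(q) z|^2 = 1 + q."""
    return 1.0 + q

def rs_energy_T_inf(mean_min_dist2, load, q=1.0, iters=2000):
    """RS energy per selected user q0/load with q0 = load * mean_min_dist2(q0).

    Combines (18) with (22), writing load = alpha * kappa.
    """
    for _ in range(iters):
        q = load * mean_min_dist2(q)
    return q / load

e_dd = rs_energy_T_inf(mean_min_dist2_dd, 0.5)
e_cvp = rs_energy_T_inf(mean_min_dist2_cvp, 0.5)
```

The derivative of the CVP expectation with respect to $q$ is $\Phi(a/s) < 1$, so the iteration is a contraction for any load below one; `10 * math.log10(e_dd / e_cvp)` gives the RS-predicted gap in decibels at the chosen load.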



It is worth noting that, under the RS assumption, the energy penalty of DD-US is explicitly given by

$$ \frac{\mathbb{E}[\mathcal{E}]}{\tilde{K}} \to \frac{1}{1 - \alpha\kappa} = \frac{1}{1 - \tilde{K}/N} \qquad (23) $$

as the block length $T$ tends to infinity. Indeed, for $\mathcal{X}_x = \{x\}$ the expectation in (22) equals $1 + q$, so that (18) reduces to $q_0 = \alpha\kappa(1 + q_0)$. The energy penalty (23) under the RS assumption is equal to that of ZF-BF with random US (RUS), in which $\tilde{K}$ users are selected uniformly at random. Similarly, the energy penalty for US-CVP under the RS assumption is equal to that for RUS and CVP (RUS-CVP) as $T \to \infty$. Since the energy penalty for DD-US in the limit $T \to \infty$ is a lower bound on that for conventional US with ZF-BF, one may conclude that conventional US yields no reduction of the energy penalty in the large-system limit. However, we cannot reach this conclusion from these observations alone, since the RS solutions are only approximations of the true energy penalty in the large-system limit. In order to investigate whether the conclusion is correct, the assumption of higher-step RSB should be considered.

V. Numerical Results 

DD-US is compared to US-CVP, ZF-BF with RUS, and RUS-CVP [19] in terms of the average energy penalty. Note that the block length $T$ is kept finite, while the coherence time $T_c$ is implicitly assumed to tend to infinity. We found that Propositions 1 and 2 provide unreliable approximations of the energy penalty for US-LVP, so US-LVP is not plotted. The assumption of higher-step RSB is required to obtain a good approximation for US-LVP.

[Fig. 1 legend: ZF-BF with RUS; RUS-CVP [19]; DD-US (1RSB); DD-US (RS); US-CVP (1RSB); US-CVP (RS); greedy [15] with K = 512, 256, 128.]
Fig. 1. $\mathbb{E}[\mathcal{E}]/\tilde{K}$ versus $\alpha\kappa = \tilde{K}/N$ for $\alpha = 4$ and $T = 20$.

Figure 1 shows the average energy penalties per selected user of the four schemes for $T = 20$ in the large-system limit. The RS and 1RSB solutions are plotted by dashed and solid lines, respectively. The energy penalties of the greedy algorithm for DD-US [15] are also shown for $K = 128, 256, 512$. The 1RSB solutions are obviously unreliable for small $\alpha\kappa$, since they exceed the energy penalties of ZF-BF with RUS and RUS-CVP, which correspond to upper bounds. We observe four results. First, the gap between the RS and 1RSB solutions for US-CVP is small for moderate-to-large $\alpha\kappa$, while the gap for DD-US is noticeable for moderate $\alpha\kappa$. Secondly, as the system size increases, the energy penalties of the greedy algorithm for DD-US [15] approach, from below, the RS solution for small $\alpha\kappa$ and the 1RSB solution for moderate $\alpha\kappa$. These results imply that the RS and 1RSB solutions for DD-US can provide acceptable approximations for small and moderate $\alpha\kappa$, respectively. Thirdly, DD-US achieves almost the same energy penalty as US-CVP for low system loads. This result can be understood as follows: $q$ in (14) is small for low system loads, so that the magnitude of $\sqrt{q}\,\Re[z_t]$ ($\sqrt{q}\,\Im[z_t]$) is smaller than the magnitude of the data symbol with high probability. Thus, the continuous relaxation (9) of the alphabet brings almost no gain in this range of system loads. Finally, DD-US outperforms RUS-CVP in terms of the energy penalty except for high $\alpha\kappa$. For $\alpha\kappa = 0.5$, DD-US provides a performance gain of 1.8 dB over RUS-CVP, which appears to be larger than the energy loss due to detection errors at the receiver [15] noted at the end of Section III-C. Note that the energy penalty for RUS-CVP approaches the asymptotic one from above as the system size increases [19]. Thus, the performance gap between DD-US and RUS-CVP should be larger for finite-sized systems.

We next assess the accuracy of the approximations based on the RS and 1RSB assumptions for DD-US. Figure 2 shows the average energy penalty per selected user versus $\alpha$ for fixed $\alpha\kappa = 0.5$. For comparison, the energy penalties of the greedy algorithm for DD-US [15] are also plotted. For small $\alpha$, the RS and 1RSB solutions are indistinguishable from each other, so they should provide an accurate approximation of the true energy penalty. The gaps between the analytical results and the numerical simulations for small $\alpha$ should be due to the suboptimality of the greedy algorithm. The 1RSB solution for $T = 80$ exhibits strange behavior. The energy penalty must be a non-increasing function of $\alpha$, since a larger $\alpha$ implies larger multiuser diversity. However, the 1RSB energy penalty increases with $\alpha$ for large $\alpha$.

Fig. 2. $\mathbb{E}[\mathcal{E}]/\tilde{K}$ versus $\alpha$ for $\alpha\kappa = 0.5$.

VI. Conclusions 

The energy penalties of DD-US and US-CVP for the MIMO-BC have been evaluated in the large-system limit under the RS and 1RSB assumptions. We made four observations. First, the RS and 1RSB assumptions for DD-US can provide acceptable approximations for low and moderate system loads, respectively. Secondly, DD-US outperforms RUS-CVP in terms of the energy penalty for low-to-moderate system loads. Thirdly, the asymptotic energy penalty of DD-US is indistinguishable from that of US-CVP for low system loads. Finally, the greedy algorithm for DD-US proposed in [15] can achieve nearly optimal performance for low-to-moderate system loads. These results imply that DD-US can provide a good tradeoff between performance and complexity for low-to-moderate system loads.

Regularized ZF-BF and minimum mean-squared error (MMSE) precoding are other important approaches to reducing the energy penalty that deserve investigation. We leave this analysis as future work.